Enabling Complex Wikipedia Queries - Technical Report

Authors

  • Gilad Katz
  • Bracha Shapira
Abstract

In this technical report we present a database schema used to store Wikipedia so it can be easily used in query-intensive applications. In addition to storing the information in a way that makes it highly accessible, our schema enables users to easily formulate complex queries using information such as the anchor-text of links and their location in the page, the titles and number of redirect pages for each page and the paragraph structure of entity pages. We have successfully used the schema in domains such as recommender systems, information retrieval and sentiment analysis. In order to assist other researchers, we now make the schema and its content available online.
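
The kind of query the abstract describes can be sketched as follows. This is only an illustration under stated assumptions, not the schema from the report: the table names (page, redirect, link), the column names, the helper functions, and the sample rows are all hypothetical, and SQLite is used here purely for convenience. The sketch combines three of the signals the abstract mentions: anchor text of links, the location of a link in the page (approximated by a paragraph index), and the number of redirect pages per page.

```python
import sqlite3

# Hypothetical schema: these tables and columns are assumptions made for
# illustration; the report's actual schema is not reproduced here.
SCHEMA = """
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
CREATE TABLE redirect (
    redirect_title TEXT NOT NULL,
    target_page_id INTEGER NOT NULL REFERENCES page(page_id)
);
CREATE TABLE link (
    source_page_id  INTEGER NOT NULL REFERENCES page(page_id),
    target_page_id  INTEGER NOT NULL REFERENCES page(page_id),
    anchor_text     TEXT NOT NULL,
    paragraph_index INTEGER NOT NULL  -- 0 = first paragraph of the source page
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [(1, "Alan Turing"), (2, "Turing machine"), (3, "Computer science")],
    )
    conn.executemany(
        "INSERT INTO redirect VALUES (?, ?)",
        [("Turing, Alan", 1), ("A. M. Turing", 1), ("Universal machine", 2)],
    )
    conn.executemany(
        "INSERT INTO link VALUES (?, ?, ?, ?)",
        [
            (3, 1, "Turing", 0),
            (3, 2, "abstract machine", 2),
            (2, 1, "Alan Turing", 0),
        ],
    )
    conn.commit()
    return conn


def pages_linked_from_lead(conn, min_redirects):
    """Return (title, redirect count) for pages that are linked from the
    first paragraph of some page and have at least min_redirects redirects."""
    query = """
        SELECT p.title, COUNT(DISTINCT r.redirect_title)
        FROM page AS p
        JOIN link AS l
          ON l.target_page_id = p.page_id AND l.paragraph_index = 0
        JOIN redirect AS r
          ON r.target_page_id = p.page_id
        GROUP BY p.page_id, p.title
        HAVING COUNT(DISTINCT r.redirect_title) >= ?
        ORDER BY p.title
    """
    return conn.execute(query, (min_redirects,)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print(pages_linked_from_lead(demo, 2))
```

Keeping links and redirects in their own tables, keyed by page identifier, is one plausible way to make such multi-signal queries a single join; whether the report's schema is organized this way is not stated in the abstract.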

Similar resources

Exploring the Knowledge in Semi Structured Data Sets with Rich Queries

Semantics can be integrated into search processing during both the document analysis and querying stages. We describe a system that both incorporates semantic annotations of Wikipedia articles into the search process and allows for rich annotation search, enabling users to formulate queries based on their knowledge about how entities relate to one another while simultaneously retaining the freedo...

Answering End-User Questions, Queries and Searches on Wikipedia and its History

Knowledge bases (KBs) encoded using RDF triples deliver many benefits to applications and programmers that access the KBs on the web via SPARQL endpoints. In this paper, we describe and compare two user-friendly systems that seek to make the universal knowledge of Web KBs available to users who know neither SPARQL nor the internals of the KBs. We first describe CANaLI, which lets people enter N...

NTCIR-12 MathIR Task Overview

We present an overview of the NTCIR-12 MathIR Task, dedicated to information access for mathematical content. The MathIR task makes use of two corpora. The first corpus contains excerpts from technical articles in the arXiv, while the second corpus contains English Wikipedia articles. For each corpus, there were two subtasks. Three subtasks contain queries with keywords and formulae (arXiv-main...

A PC Chase

PC stands for path-conjunctive, the name of a class of queries and dependencies that we define over complex values with dictionaries. This class includes the relational conjunctive queries and embedded dependencies, as well as many interesting examples of complex value and oodb queries and integrity constraints. We show that some important classical results on containment, dependency implicatio...

UNT at ImageCLEF 2010: CLIR for Wikipedia Images

This paper presents the results of the University of North Texas team in the Wikipedia image retrieval track of Image-CLEF-2010. Our approach is based on translating the French and German image captions to English and using language models to generate our runs. We also explore the use of complex queries by asking two users to manually build queries based on the origin...

Journal title:
  • CoRR

Volume abs/1508.03298

Publication date 2015